
    Scalable kernelization for the maximum independent set problem


    Communication Efficient Algorithms for Distributed OLAP Query Execution

As a result of the growing amounts of data in today's databases, one machine is often not sufficient to store and process them. The natural solution to this problem is to scale the system out onto a cluster. However, distributing the data across the machines of the cluster means that communication accounts for a large share of a query's overall execution time, especially for complex analytical queries. For this reason, we try to minimize the volume of communicated data when a query cannot be executed on a single node of the cluster without any communication. We analyze techniques from previous work and propose improvements to them, backed by a complexity analysis of the communication volume for both our algorithms and those from previous work. For the evaluation, we implement our algorithms for selected queries of the TPC-H benchmark and run them on a cluster of up to 128 nodes with a database of up to 30 terabytes of uncompressed data (128 TB if only a small proportion of the database is used). We provide both scaling experiments and runtime comparisons to previous work and the current TPC-H record holder. The main contributions of this work are:
• A technique to find a better partitioning of the tables in a database, allowing joins to be executed without communication effort
• An algorithm that selects the first k tuples of the result set of a query with a communication effort independent of the size of the database, given certain conditions on the partitioning (see the sketch after this list)
• An analysis of the communication effort of a delayed join that cannot be evaluated locally on a node, compared to the communication effort of executing the join early
• The application of our algorithms to complex queries of the TPC-H benchmark that cannot be executed without a high communication effort
• The implementation of the queries in a prototype and the evaluation of our algorithms on a large cluster of 128 nodes with a database of up to 30 terabytes of uncompressed data (or 128 TB if only a small proportion of the database is used)
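To make the top-k contribution above concrete, here is a minimal sketch of the general idea that selecting the first k tuples can require a communication volume independent of the table size: every node reduces its local data to at most k candidates, so only k tuples per node ever cross the network. The types and names (`Tuple`, `local_top_k`, `merge_top_k`) are illustrative assumptions, not the algorithm from the thesis, which additionally exploits the partitioning conditions mentioned above.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical tuple type: only the sort key matters for this sketch.
struct Tuple {
    long key;      // ORDER BY column
    long payload;  // stand-in for the remaining attributes
};

// Each node keeps only its k smallest tuples (ORDER BY key LIMIT k).
// Communication volume per node is O(k), independent of the table size.
std::vector<Tuple> local_top_k(std::vector<Tuple> local, std::size_t k) {
    std::size_t keep = std::min(k, local.size());
    std::partial_sort(local.begin(), local.begin() + keep, local.end(),
                      [](const Tuple& a, const Tuple& b) { return a.key < b.key; });
    local.resize(keep);
    return local;
}

// A coordinator merges the p * k candidates and keeps the global top-k.
std::vector<Tuple> merge_top_k(std::vector<std::vector<Tuple>> candidates, std::size_t k) {
    std::vector<Tuple> all;
    for (auto& c : candidates) all.insert(all.end(), c.begin(), c.end());
    return local_top_k(std::move(all), k);
}
```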

    Scalable Kernelization for Maximum Independent Sets

The most efficient algorithms for finding maximum independent sets in both theory and practice use reduction rules to obtain a much smaller problem instance called a kernel. The kernel can then be solved quickly using exact or heuristic algorithms, or by repeatedly kernelizing recursively in the branch-and-reduce paradigm. It is of critical importance for these algorithms that kernelization is fast and returns a small kernel. Current algorithms are either slow but produce a small kernel, or fast and give a large kernel. We attempt to accomplish both of these goals simultaneously, by giving an efficient parallel kernelization algorithm based on graph partitioning and parallel bipartite maximum matching. We combine our parallelization techniques with two techniques to accelerate kernelization further: dependency checking that prunes reductions that cannot be applied, and reduction tracking that allows us to stop kernelization when reductions become less fruitful. Our algorithm produces kernels that are orders of magnitude smaller than the fastest kernelization methods, while having a similar execution time. Furthermore, our algorithm is able to compute kernels with size comparable to the smallest known kernels, but up to two orders of magnitude faster than previously possible. Finally, we show that our kernelization algorithm can be used to accelerate existing state-of-the-art heuristic algorithms, allowing us to find larger independent sets faster on large real-world networks and synthetic instances.
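As an illustration of the kind of reduction rule such a kernelization builds on, the following is a minimal sketch of the classical degree-one (pendant vertex) rule: a vertex with exactly one remaining neighbor can always be taken into some maximum independent set, so it is added to the solution and removed together with its neighbor. The graph representation and function names are assumptions made for illustration; the paper's parallel kernelization applies a considerably larger rule set with dependency checking and reduction tracking.

```cpp
#include <vector>

// Graph as adjacency lists; removed[v] marks vertices deleted by reductions.
struct Graph {
    std::vector<std::vector<int>> adj;
    std::vector<bool> removed;
};

// Number of neighbors of v that have not been removed yet.
int degree(const Graph& g, int v) {
    int d = 0;
    for (int u : g.adj[v]) if (!g.removed[u]) ++d;
    return d;
}

// Degree-one rule: a pendant vertex v is always in some maximum independent
// set, so add v to the solution and delete v and its unique alive neighbor.
// Returns true if at least one reduction was applied.
bool apply_degree_one_rule(Graph& g, std::vector<int>& solution) {
    bool changed = false;
    for (int v = 0; v < static_cast<int>(g.adj.size()); ++v) {
        if (g.removed[v] || degree(g, v) != 1) continue;
        int neighbor = -1;
        for (int u : g.adj[v]) if (!g.removed[u]) neighbor = u;
        solution.push_back(v);
        g.removed[v] = true;
        g.removed[neighbor] = true;
        changed = true;
    }
    return changed;
}
```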

    Enabling Scalability: Graph Hierarchies and Fault Tolerance

In this dissertation, we explore approaches to two techniques for building scalable algorithms. First, we look at different graph problems and show how to exploit the input graph's inherent hierarchy to obtain scalable graph algorithms. The second technique takes a step back from concrete algorithmic problems: we consider node failures in large distributed systems and present techniques to quickly recover from them.

In the first part of the dissertation, we investigate how hierarchies in graphs can be used to scale algorithms to large inputs. We develop algorithms for three graph problems based on two approaches to building hierarchies. The first approach reduces instance sizes for NP-hard problems by applying so-called reduction rules. These rules can be applied in polynomial time. They either find parts of the input that can be solved in polynomial time, or they identify structures that can be contracted (reduced) into smaller structures without loss of information for the specific problem. After solving the reduced instance using an exponential-time algorithm, the previously contracted structures can be uncontracted to obtain an exact solution for the original input. In addition to serving as a simple preprocessing procedure, reduction rules can also be used in branch-and-reduce algorithms, where they are successively applied after each branching step to build a hierarchy of problem kernels of increasing computational hardness (see the sketch below). We develop reduction-based algorithms for the classical NP-hard problems Maximum Independent Set and Maximum Cut. The second approach is used for route planning in road networks, where we build a hierarchy of road segments based on their importance for long-distance shortest paths. By only considering important road segments when we are far away from the source and destination, we can substantially speed up shortest-path queries.

In the second part of this dissertation, we take a step back from concrete graph problems and look at more general problems in high-performance computing (HPC). Due to the ever-increasing size and complexity of HPC clusters, we expect hardware and software failures to become more common in massively parallel computations. We present two techniques that allow applications to recover from failures and resume computation. Both techniques are based on in-memory storage of redundant information and a data distribution that enables fast recovery. The first technique can be used for general-purpose distributed processing frameworks: we identify data that is redundantly available on multiple machines and only introduce additional work for the remaining data that is available on just one machine. The second technique is a checkpointing library engineered for fast recovery using a data distribution method that achieves balanced communication loads. Both techniques work in settings where computation after a failure is continued with fewer machines than before. This is in contrast to many previous approaches that, in particular for checkpointing, focus on systems that keep spare resources available to replace failed machines.

Overall, we present different techniques that enable scalable algorithms. While some of these techniques are specific to graph problems, we also present tools for fault-tolerant algorithms and applications in a distributed setting. To show that these can be helpful in many different domains, we evaluate them on graph problems and on other applications such as phylogenetic tree inference.
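The branch-and-reduce control flow described above can be summarized in a few lines. The following is a deliberately simplified, self-contained sketch for Maximum Independent Set that uses only one trivial reduction (taking isolated vertices) and branches on a vertex of maximum degree; the dissertation's algorithms use far richer rule sets and bounds, so this illustrates the structure, not the actual method.

```cpp
#include <algorithm>
#include <vector>

using Graph = std::vector<std::vector<int>>;  // adjacency lists

// Returns the size of a maximum independent set among the vertices still
// marked alive; `current` counts vertices already taken on this branch.
int mis(std::vector<bool> alive, const Graph& g, int current) {
    // Reduction step: an isolated vertex is always in some optimal solution.
    bool changed = true;
    while (changed) {
        changed = false;
        for (int v = 0; v < static_cast<int>(g.size()); ++v) {
            if (!alive[v]) continue;
            bool isolated = true;
            for (int u : g[v]) if (alive[u]) { isolated = false; break; }
            if (isolated) { alive[v] = false; ++current; changed = true; }
        }
    }
    // Branching step: pick an alive vertex of maximum remaining degree.
    int branch = -1, best_deg = -1;
    for (int v = 0; v < static_cast<int>(g.size()); ++v) {
        if (!alive[v]) continue;
        int d = 0;
        for (int u : g[v]) if (alive[u]) ++d;
        if (d > best_deg) { best_deg = d; branch = v; }
    }
    if (branch < 0) return current;  // the (trivial) kernel is solved
    // Case 1: include the branching vertex, removing it and its neighbors.
    std::vector<bool> incl = alive;
    incl[branch] = false;
    for (int u : g[branch]) incl[u] = false;
    int best = mis(std::move(incl), g, current + 1);
    // Case 2: exclude the branching vertex.
    std::vector<bool> excl = alive;
    excl[branch] = false;
    return std::max(best, mis(std::move(excl), g, current));
}
```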

    Targeted Branching for the Maximum Independent Set Problem

Finding a maximum independent set is a fundamental NP-hard problem that is used in many real-world applications. Given an unweighted graph, this problem asks for a maximum-cardinality set of pairwise non-adjacent vertices. In recent years, some of the most successful algorithms for solving this problem have been based on the branch-and-bound or branch-and-reduce paradigms. In particular, branch-and-reduce algorithms, which combine branch-and-bound with reduction rules, have achieved substantial results, solving many previously infeasible real-world instances. These results were in large part achieved by developing new, more practical reduction rules. However, other components that have been shown to have a significant impact on the performance of these algorithms have not received as much attention. One of these is the branching strategy, which determines which vertex is included or excluded in a potential solution. Even now, the most commonly used strategy selects vertices solely based on their degree and does not take into account other factors that contribute to the performance of the algorithm. In this work, we develop and evaluate several novel branching strategies for both branch-and-bound and branch-and-reduce algorithms. Our strategies are based on one of two approaches, both motivated by existing research. They either (1) aim to decompose the graph into two or more connected components which can then be solved independently, or (2) try to remove vertices that hinder the application of a reduction rule, which can lead to smaller graphs. Our experimental evaluation on a large set of real-world instances indicates that our strategies improve the performance of the state-of-the-art branch-and-reduce algorithm by Akiba and Iwata. To be more specific, our reduction-based packing branching rule outperforms the default branching strategy of selecting a vertex of highest degree on 65% of all instances tested. Furthermore, our decomposition-based strategy based on edge cuts achieves a speedup of 2.29 on sparse networks (1.22 on all instances).
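As a rough illustration of the decomposition-based idea mentioned above, the sketch below prefers to branch on a cut vertex (articulation point), whose removal splits the remaining graph into connected components that can be solved independently, and falls back to the classical maximum-degree rule otherwise. This is a simplified stand-in, not the edge-cut or packing strategies evaluated in the paper; the names and the plain DFS are illustrative assumptions, and the code assumes a non-empty simple graph.

```cpp
#include <algorithm>
#include <vector>

using Graph = std::vector<std::vector<int>>;  // adjacency lists, simple graph

// Standard DFS lowpoint computation marking articulation points (cut vertices).
void dfs_articulation(const Graph& g, int v, int parent, int& timer,
                      std::vector<int>& disc, std::vector<int>& low,
                      std::vector<bool>& is_cut) {
    disc[v] = low[v] = timer++;
    int children = 0;
    for (int u : g[v]) {
        if (u == parent) continue;
        if (disc[u] != -1) {
            low[v] = std::min(low[v], disc[u]);  // back edge
        } else {
            ++children;
            dfs_articulation(g, u, v, timer, disc, low, is_cut);
            low[v] = std::min(low[v], low[u]);
            if (parent != -1 && low[u] >= disc[v]) is_cut[v] = true;
        }
    }
    if (parent == -1 && children > 1) is_cut[v] = true;  // DFS root case
}

// Branching selector: prefer a cut vertex so the remaining graph decomposes
// into independently solvable components; otherwise fall back to max degree.
int pick_branching_vertex(const Graph& g) {
    int n = static_cast<int>(g.size());
    std::vector<int> disc(n, -1), low(n, 0);
    std::vector<bool> is_cut(n, false);
    int timer = 0;
    for (int v = 0; v < n; ++v)
        if (disc[v] == -1) dfs_articulation(g, v, -1, timer, disc, low, is_cut);
    for (int v = 0; v < n; ++v)
        if (is_cut[v]) return v;
    int best = 0;  // fallback: vertex of maximum degree
    for (int v = 1; v < n; ++v)
        if (g[v].size() > g[best].size()) best = v;
    return best;
}
```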

    ReStore: In-Memory REplicated STORagE for Rapid Recovery in Fault-Tolerant Algorithms

Fault-tolerant distributed applications require mechanisms to recover data lost through a process failure. On modern cluster systems it is typically impractical to request replacement resources after such a failure. Therefore, applications have to continue working with the remaining resources. This requires redistributing the workload and reloading data on the non-failed processes. We present an algorithmic framework and its C++ library implementation ReStore for MPI programs that enables recovery of data after process failures. By storing all required data in memory via an appropriate data distribution and replication, recovery is substantially faster than with standard checkpointing schemes that rely on a parallel file system. As the application developer can specify which data to load, we also support shrinking recovery instead of recovery using spare compute nodes. We evaluate ReStore in both controlled, isolated environments and real applications. Our experiments show loading times of lost input data in the range of milliseconds on up to 24,576 processors and a substantial speedup of the recovery time for the fault-tolerant version of a widely used bioinformatics application.
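To make the idea of replicated in-memory storage tangible, here is a tiny sketch of a deterministic replica placement: each data block is kept on several consecutive ranks, so after a single process failure every block still has at least one surviving copy that can be served from memory. The mapping and function names below are illustrative assumptions, not ReStore's actual data distribution or API.

```cpp
#include <vector>

// Illustrative replica placement: block b is kept on `replication` consecutive
// ranks starting at b % num_ranks, so any single failure leaves at least one
// copy alive and the surviving ranks can serve the lost data from memory.
std::vector<int> replica_ranks(int block, int num_ranks, int replication) {
    std::vector<int> ranks;
    int first = block % num_ranks;
    for (int i = 0; i < replication; ++i)
        ranks.push_back((first + i) % num_ranks);
    return ranks;
}

// After a failure, a lost block is recovered from any surviving replica.
int recovery_source(int block, int num_ranks, int replication, int failed_rank) {
    for (int r : replica_ranks(block, num_ranks, replication))
        if (r != failed_rank) return r;  // first surviving copy
    return -1;                           // all copies lost (replication too small)
}
```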

    Pareto Sums of Pareto Sets
